International Journal of Engineering and Technical Research (IJETR) 
ISSN: 2321-0869, Volume-3, Issue-5, May 2015 


Design and Implementation of TSEMAC and UDP/IP Network Stack on FPGA

Bhavika A. Vithalapara, Abhimanyu Dhiman, Sudhir Agrawal, Shailendrasinh Parmar 


Abstract — This paper presents a high-speed FPGA implementation of a Triple Speed Ethernet MAC (TSEMAC) and a UDP/IP stack, the stack widely used in transport streaming and video conferencing applications. The physical layer and its interface to the FPGA's I/O blocks are implemented off the shelf using an integrated Ethernet transceiver (National DP83848C). The link layer is based on the Altera Triple Speed Ethernet MAC core. A novel architecture of the network and transport layers, built from FPGA fabric and dedicated FPGA blocks, is also proposed; it provides PC-to-FPGA and FPGA-to-PC data communication. The design is analyzed and tested using a software program running on a PC that sends and receives data. The proposed system shows a noticeable speed-up suitable for FPGA-based data streaming applications.

Index Terms— TSEMAC, UDP/IP stack, FPGA, PER Test Tool.


I. INTRODUCTION 

Ethernet is synonymous with networking, and its applications are ubiquitous worldwide. It provides high bandwidth over long cable lengths and a driver-less architecture within an operating system environment [1]. It is a very popular and commonly used standard for connecting to the Internet or a LAN (Local Area Network). In the past, connecting a system to a LAN required additional network circuits that had more functionality than needed, which came at a higher cost [2]. Furthermore, a processor was needed to implement the network stack. With FPGA technology, it is now easy to implement an application-tailored part of a UDP/IP stack and achieve a cost-effective and straightforward connection to a network.

The UDP/IP protocol suite is used for audio and video streaming in VoIP, broadcasting, multimedia, and video conference communications [3]. For real-time conversations, UDP provides low-delay datagram transfer because it is an unreliable service: lost datagrams are neither acknowledged nor retransmitted.

This work presents a TSEMAC and UDP/IP network stack implementation on an FPGA and compares it with other existing work. The packets exchanged with the application layer are captured and displayed in Wireshark. This paper is organized as follows: Section II presents an overview of the OSI model layers. The proposed system and its implementation are detailed in Section III, while evaluation and results are presented in Section IV. Finally, Section V concludes the work.

II. OSI Model 

The OSI (Open Systems Interconnection) model is a theoretical model used to describe the behavior of a network. It consists of seven layers: Application, Presentation, Session, Transport, Network, Data Link, and Physical. From a TCP/UDP/IP viewpoint, the Session and Presentation layers are often included in the Application layer. The OSI layers are frequently referred to in this paper but are not explained in depth; for a detailed description of the protocols and layers, see [4].

A. Data Link Layer 

The link layer provides data exchange between computers within a local network. Its basic unit of data transfer is the frame, which is composed of a header, a payload, and a trailer. The header carries the source address, the destination address, and other control information. The trailer contains a checksum of the transported data (in Ethernet, a 32-bit CRC called the frame check sequence, FCS), which the receiver uses to detect errors that occurred during transfer. The network-layer packet is carried in the payload.
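To make the header/payload/trailer structure concrete, the following is a minimal Python sketch of how an Ethernet II frame could be assembled and checked in software. It assumes the standard IEEE 802.3 CRC-32, which Python's `zlib.crc32` computes, and the usual 64-byte minimum frame size; the helper names `build_ethernet_frame` and `fcs_ok` are hypothetical. In a hardware design such as this one, the MAC normally computes and checks the FCS itself, so the sketch is only illustrative.

```python
import struct
import zlib

MIN_LEN_WITHOUT_FCS = 60  # 64-byte minimum Ethernet frame minus the 4-byte FCS


def build_ethernet_frame(dst: bytes, src: bytes, ethertype: int, payload: bytes) -> bytes:
    """Header (destination, source, type) + payload + FCS trailer."""
    if len(dst) != 6 or len(src) != 6:
        raise ValueError("MAC addresses must be 6 bytes long")
    body = dst + src + struct.pack("!H", ethertype) + payload
    body = body.ljust(MIN_LEN_WITHOUT_FCS, b"\x00")  # zero-pad short frames
    fcs = zlib.crc32(body) & 0xFFFFFFFF  # IEEE 802.3 CRC-32 over header and payload
    # Appended little-endian: the usual software convention for zlib's reflected CRC.
    return body + struct.pack("<I", fcs)


def fcs_ok(frame: bytes) -> bool:
    """Receiver-side check: recompute the CRC and compare it with the trailer."""
    body, trailer = frame[:-4], frame[-4:]
    return (zlib.crc32(body) & 0xFFFFFFFF) == struct.unpack("<I", trailer)[0]


# Device MAC as destination, host MAC as source (addresses from the paper's capture).
frame = build_ethernet_frame(bytes.fromhex("0007edffcd15"),
                             bytes.fromhex("901b0e0c2be9"), 0x0800, b"hello")
```

A 5-byte payload is padded up to the 64-byte minimum, while a 1500-byte payload produces the 1518-byte maximum frame length cited later in Table I (14-byte header, 1500-byte payload, 4-byte FCS).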

B. Network Layer 

The network layer is responsible for host-to-host delivery of packets and establishes the route between the originating and destination computers. Its basic unit of data transfer is the datagram, which is composed of a header and a data field and is carried as the payload (data field) of a frame. This layer is used to communicate with computer systems that lie beyond the local LAN segment, because it has its own routing and addressing architecture, separate and distinct from that of the link layer. Such protocols are known as routable protocols; IP (Internet Protocol) is an example [5] and is the most important protocol of the network layer.
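To illustrate what the network layer adds to each frame, the following Python sketch builds a minimal 20-byte IPv4 header (no options, fragmentation flags clear) and fills in its header checksum with the one's-complement Internet checksum described in RFC 1071. It is a sketch under those assumptions, not the NicheStack code used in this work, and `build_ipv4_header` and `internet_checksum` are hypothetical names. With the field values of the UDP frames captured in Section IV (total length 1228, identification 0x0016, TTL 64, protocol 17, 192.168.10.234 to 192.168.10.235), it yields the header checksum 0xdee5 that appears in the hex dump of that capture.

```python
import ipaddress
import struct


def internet_checksum(data: bytes) -> int:
    """RFC 1071 one's-complement checksum of 16-bit big-endian words."""
    if len(data) % 2:
        data += b"\x00"  # pad odd-length input with a zero byte
    total = sum(struct.unpack(f"!{len(data) // 2}H", data))
    while total >> 16:  # fold carries back into the low 16 bits
        total = (total & 0xFFFF) + (total >> 16)
    return ~total & 0xFFFF


def build_ipv4_header(src: str, dst: str, payload_len: int, ident: int,
                      proto: int = 17, ttl: int = 64) -> bytes:
    """20-byte IPv4 header (no options, DF/MF clear) with its checksum filled in."""
    version_ihl = (4 << 4) | 5  # version 4, header length = 5 x 32-bit words
    header = struct.pack(
        "!BBHHHBBH4s4s",
        version_ihl, 0, 20 + payload_len, ident, 0, ttl, proto, 0,  # checksum = 0 for now
        ipaddress.IPv4Address(src).packed, ipaddress.IPv4Address(dst).packed,
    )
    checksum = internet_checksum(header)
    return header[:10] + struct.pack("!H", checksum) + header[12:]


# Device -> host, carrying a 1208-byte UDP datagram (8-byte header + 1200-byte payload).
hdr = build_ipv4_header("192.168.10.234", "192.168.10.235", payload_len=1208, ident=0x0016)
```

A receiver runs the same checksum over the complete header, checksum field included; a result of 0 means the header arrived intact.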

C. Transport Layer 

The transport layer is responsible for communication between two applications running on different computers [6]. Its basic transmission unit is the segment, which is composed of a header and a payload and is carried in the payload of a network-layer packet. The transport layer can provide end-to-end reliability through flow and error control. Its two main protocols are TCP (Transmission Control Protocol) and UDP (User Datagram Protocol). TCP provides connection-oriented communication with flow control, reliable data delivery, and duplicate data suppression, whereas UDP provides unreliable, connectionless communication services: it allows applications to send datagrams and handles the translation between sockets and ports. UDP is therefore very effective for real-time applications such as audio and video, or for applications where low delay and low latency are preferred over reliable data delivery [7].
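UDP's only error check is an optional 16-bit checksum computed over an IPv4 pseudo-header (source and destination addresses, a zero byte, protocol number 17, and the UDP length) followed by the UDP header and payload. The sketch below shows that computation in Python following RFC 768; ports 30 and 1235 and the 1200-byte payload mirror the PER test settings in Section IV, while `build_udp`, `pseudo_header`, and `udp_checksum_ok` are hypothetical helpers, not NicheStack functions.

```python
import ipaddress
import struct


def ones_complement_sum(data: bytes) -> int:
    """16-bit one's-complement sum with end-around carry (RFC 1071)."""
    if len(data) % 2:
        data += b"\x00"  # odd length: pad with a zero byte for the sum only
    total = sum(struct.unpack(f"!{len(data) // 2}H", data))
    while total >> 16:
        total = (total & 0xFFFF) + (total >> 16)
    return total


def pseudo_header(src: str, dst: str, udp_len: int) -> bytes:
    """IPv4 pseudo-header: source, destination, zero, protocol 17, UDP length."""
    return (ipaddress.IPv4Address(src).packed + ipaddress.IPv4Address(dst).packed
            + struct.pack("!BBH", 0, 17, udp_len))


def build_udp(src: str, dst: str, sport: int, dport: int, payload: bytes) -> bytes:
    """UDP header (ports, length, checksum) followed by the payload."""
    length = 8 + len(payload)
    segment = struct.pack("!HHHH", sport, dport, length, 0) + payload
    checksum = ~ones_complement_sum(pseudo_header(src, dst, length) + segment) & 0xFFFF
    if checksum == 0:
        checksum = 0xFFFF  # a transmitted 0 would mean "no checksum" over IPv4
    return segment[:6] + struct.pack("!H", checksum) + segment[8:]


def udp_checksum_ok(src: str, dst: str, segment: bytes) -> bool:
    """Receiver check: the sum over pseudo-header and segment must be all ones."""
    return ones_complement_sum(pseudo_header(src, dst, len(segment)) + segment) == 0xFFFF


# Device (port 30) answering the host's TX port 1235 with a 1200-byte payload.
seg = build_udp("192.168.10.234", "192.168.10.235", 30, 1235, b"\xaa" * 1200)
```

Because the addresses are folded into the checksum, a datagram delivered to the wrong host fails the check even if its UDP bytes are intact.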

III. The Proposed System and Implementation

The hardware UDP/IP stack core is shown in Figure 1. Figure 1(a) shows the architecture of a traditional TCP/IP stack, and Figure 1(b) shows the UDP/IP stack used in this work, which is implemented entirely on the FPGA.


(a) TCP/IP stack        (b) Used architecture
Application layer       Application layer
Transport layer         UDP
Network layer           IPv4
Link layer              Ethernet
Physical layer          Physical Medium

Figure 1 (a) Architecture of the TCP/IP stack, (b) architecture used in this work


In our design, UDP is implemented at the transport layer and Internet Protocol version 4 (IPv4) at the network layer. IPv4 gives a more area-efficient design than IPv6, for example because it handles 32-bit rather than 128-bit addresses.

The architecture of the proposed design is shown in Figure 2. It consists of several modules. Starting with the physical layer, the DP83848C PHY device is used as the interface for Ethernet communication at 10/100 Mb/s. The PHY connects to the Ethernet cable through an RJ-45 connector. The MII (Media Independent Interface) is defined by the IEEE 802.3 specification. When an MII interface is used, the PHY chip provides the tx and rx clocks to the MAC: a 2.5 MHz clock when the MII operates at 10 Mb/s and a 25 MHz clock at 100 Mb/s.
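The 2.5 MHz and 25 MHz values follow from the MII data path being 4 bits wide (tx_d[3:0] and rx_d[3:0]), so the PHY clock runs at one quarter of the line rate. The short Python sketch below only works through that arithmetic; `mii_clock_hz` and `mii_cycles` are hypothetical helper names.

```python
MII_WIDTH_BITS = 4  # tx_d[3:0] / rx_d[3:0] carry one nibble per clock


def mii_clock_hz(line_rate_bps: int) -> float:
    """PHY-supplied MII clock needed to carry the line rate 4 bits at a time."""
    return line_rate_bps / MII_WIDTH_BITS


def mii_cycles(n_bytes: int) -> int:
    """MII clock cycles to transfer n bytes (two nibbles per byte)."""
    return n_bytes * 8 // MII_WIDTH_BITS


for rate in (10_000_000, 100_000_000):
    print(f"{rate // 1_000_000} Mb/s -> {mii_clock_hz(rate) / 1e6} MHz")
```

For instance, a maximum-length 1518-byte frame occupies 3036 MII clock cycles, not counting the preamble and start-of-frame delimiter.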

Altera offers the Triple Speed Ethernet MAC MegaCore function, which supports a variety of operating modes. It can be configured for half-duplex or full-duplex operation with minimal resource consumption, which makes it suitable for data streaming applications. The PHY interface is managed by the MDIO module. In our design, the MII interface is used in full-duplex mode at 100 Mb/s. TSEMAC features and operation are described in [8].

The Ethernet MAC function provides an Avalon-ST interface to user applications and an industry-standard interface to external PHY devices. It can support up to 24 ports. A single-port MAC can be configured to include internal FIFO buffers that store the data on the transmit and receive paths [8]. The PHY provides a 25 MHz clock to the TSEMAC. The PHY is managed by the MDIO (Management Data Input/Output) module, which uses a 2.5 MHz Management Data Clock (MDC).
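Each PHY register access over MDIO is a short serial frame clocked by MDC. The Python sketch below builds the bit pattern of an IEEE 802.3 Clause 22 management frame (32-bit preamble, start, opcode, 5-bit PHY and register addresses, turnaround, 16 data bits) and computes how long it takes at 2.5 MHz. The MDIO module performs this serialization in hardware, so this only shows what travels on the line; the PHY address 1 and the register value are illustrative assumptions, and `mdio_frame` is a hypothetical helper.

```python
MDC_HZ = 2_500_000  # Management Data Clock frequency from the text


def mdio_frame(op: str, phy_addr: int, reg_addr: int, data: int = 0) -> str:
    """Bit pattern of an IEEE 802.3 Clause 22 management frame, MSB first per field.

    For a read, the PHY drives the second turnaround bit and the 16 data bits,
    so they are shown as 'Z0' and '?' placeholders rather than values.
    """
    if op not in ("read", "write"):
        raise ValueError("op must be 'read' or 'write'")
    if not (0 <= phy_addr < 32 and 0 <= reg_addr < 32 and 0 <= data < 1 << 16):
        raise ValueError("field out of range")
    preamble = "1" * 32  # 32 ones
    start = "01"         # ST
    opcode = "10" if op == "read" else "01"
    header = preamble + start + opcode + f"{phy_addr:05b}" + f"{reg_addr:05b}"
    if op == "write":
        return header + "10" + f"{data:016b}"  # MAC drives turnaround and data
    return header + "Z0" + "?" * 16          # PHY drives the data back


def mdio_frame_time_s(frame: str) -> float:
    """One bit per MDC cycle."""
    return len(frame) / MDC_HZ


# Hypothetical example: write the basic control register (register 0) of a PHY at
# address 1 with 0x2100 = 100 Mb/s speed select (bit 13) + full duplex (bit 8).
write_frame = mdio_frame("write", phy_addr=1, reg_addr=0, data=0x2100)
```

Every Clause 22 frame is 64 bits long, so at 2.5 MHz a single register access takes 25.6 µs, which is negligible for occasional PHY configuration.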

The MegaCore function is built from two units: the transmitter and the receiver. On the transmit path, data transfers are synchronous to the rising edge of tx_clk. The tx_en signal is asserted to indicate the start of a new frame and remains high until the last byte of the frame has been transferred on tx_d[3:0]. If an error occurs in the frame during transmission, tx_err is asserted for one clock cycle.


Figure 2 Proposed Design Architecture

On the receive path, all signals are sampled on the rising edge of rx_clk. The PHY asserts rx_en to indicate the start of a new frame and keeps it high until the last byte has been transferred on rx_d[3:0]. If the PHY detects an error in a frame received from the line, it asserts rx_err for one clock cycle.
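The nibble-wide transfer described above can be modelled cycle by cycle. This Python sketch assumes the standard IEEE 802.3 MII conventions: the frame is preceded by a 7-byte 0x55 preamble and a 0xD5 start-of-frame delimiter, and each byte is sent least-significant nibble first. It ignores tx_err/rx_err and is a behavioural model only, not the TSEMAC RTL; `mii_tx_nibbles` and `mii_rx_bytes` are hypothetical names.

```python
from typing import Iterable, List, Tuple

PREAMBLE_SFD = bytes([0x55] * 7 + [0xD5])


def mii_tx_nibbles(frame: bytes) -> List[Tuple[int, int]]:
    """(tx_en, tx_d) per tx_clk cycle: preamble/SFD + frame, low nibble first."""
    cycles = []
    for byte in PREAMBLE_SFD + frame:
        cycles.append((1, byte & 0x0F))         # bits 3:0 go out first
        cycles.append((1, (byte >> 4) & 0x0F))  # then bits 7:4
    cycles.append((0, 0))  # tx_en drops after the last nibble
    return cycles


def mii_rx_bytes(cycles: Iterable[Tuple[int, int]]) -> bytes:
    """Rebuild bytes from (rx_en, rx_d) samples and strip the preamble and SFD."""
    nibbles = [d for en, d in cycles if en]  # keep only samples while rx_en is high
    data = bytes(lo | (hi << 4) for lo, hi in zip(nibbles[0::2], nibbles[1::2]))
    sfd = data.find(b"\xd5")
    return data[sfd + 1:] if sfd >= 0 else b""


trace = mii_tx_nibbles(b"\x0a\xbc")
```

On the wire, the preamble and SFD appear as fifteen nibbles of 0x5 followed by 0xD, which is the pattern usually visible at the start of a frame in a logic-analyzer capture of an MII bus.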

IV. Evaluation and Results 

The hardware system is designed in Qsys using the Triple Speed Ethernet MegaCore function. It comprises three systems: the Ethernet main system, the Ethernet subsystem, and the peripheral system. The Ethernet main system consists of the Nios II processor, which controls and configures all the modules; SDRAM, which stores the data; flash memory, which stores the MAC address of the device; and the peripheral and Ethernet subsystems. The Ethernet subsystem consists of an Ethernet bridge that connects it to the main system; the TSEMAC, which configures, controls, and accesses the PHY device at different speeds and modes; the SGDMA (Scatter-Gather DMA) TX and RX controllers, which send data to and receive data from the TSEMAC FIFOs; and a descriptor memory, which stores the descriptors used by the DMA controllers.
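In general, a scatter-gather DMA does not require a frame to sit in one contiguous buffer: it follows a chain of descriptors, each naming a buffer address and length, and streams the pieces out back to back. The toy Python model below illustrates only that idea; the `Descriptor` fields and the `gather` helper are hypothetical simplifications, not the Altera SGDMA descriptor format or HAL API.

```python
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass
class Descriptor:
    """Hypothetical, simplified descriptor; not the Altera SGDMA register layout."""
    address: int                 # start of the buffer in (simulated) memory
    length: int                  # bytes to transfer from that buffer
    next_index: Optional[int]    # next descriptor in the chain, None ends it
    end_of_packet: bool = False  # marks the last buffer of a frame


def gather(memory: bytes, table: Dict[int, Descriptor], first: int) -> bytes:
    """Walk the chain as a memory-to-stream DMA would and concatenate the buffers."""
    out = bytearray()
    idx: Optional[int] = first
    while idx is not None:
        d = table[idx]
        out += memory[d.address:d.address + d.length]
        if d.end_of_packet:
            break
        idx = d.next_index
    return bytes(out)


# Headers and payload live in separate buffers but leave as one stream.
mem = bytearray(64)
mem[0:4] = b"HDR:"
mem[32:37] = b"hello"
table = {0: Descriptor(0, 4, 1), 1: Descriptor(32, 5, None, end_of_packet=True)}
stream = gather(bytes(mem), table, 0)
```

This is why a network stack can prepend protocol headers without copying the payload: the headers go in one buffer, the payload stays in another, and the descriptor chain joins them.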

The peripheral system consists of a peripheral bridge that connects it to the main system, a performance counter for debugging and system performance analysis, a system clock timer, and a JTAG UART for serial communication and for debugging the Nios II application via the on-board USB-Blaster circuitry. A system ID block is used to keep the hardware system generation in sync with the software generation tools. One PLL (Phase-Locked Loop) accepts the 50 MHz input clock and generates, among others, the 100 MHz system clock, the 100 MHz SSRAM clock, and the 60 MHz peripheral clock. The software interface is provided by Eclipse, the software development tool for the Nios II processor. To transmit data, the SGDMA TX controller is opened, tx_en is asserted, and the data are sent. Figure 3 shows the transmitted data in the SignalTap analyzer, a real-time logic analyzer for the system.
Figure 3 Tx data in SignalTap analyzer
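Each PLL output clock is the 50 MHz input scaled by an integer ratio, f_out = f_in × M / N. The Python sketch below computes the smallest such ratio for the clocks named in the system description. It ignores the VCO-range and counter constraints of the Cyclone III PLL, so the counter values the PLL configuration actually chooses may differ; `pll_ratio` is a hypothetical helper.

```python
from fractions import Fraction

REF_CLOCK_MHZ = 50  # PLL input clock given in the text


def pll_ratio(target_mhz: int) -> Fraction:
    """Smallest M/N with target = REF_CLOCK_MHZ * M / N (VCO limits ignored)."""
    return Fraction(target_mhz, REF_CLOCK_MHZ)


clocks = {"system": 100, "SSRAM": 100, "peripheral": 60}
for name, mhz in clocks.items():
    r = pll_ratio(mhz)
    print(f"{name}: {REF_CLOCK_MHZ} MHz x {r.numerator}/{r.denominator} = {mhz} MHz")
```

The 100 MHz clocks need only a multiply-by-2, while the 60 MHz peripheral clock needs a 6/5 ratio, i.e. both a multiplier and a divider.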


When a frame arrives on the receive side, the PHY asserts rx_en and the Nios II processor accesses the data through the SGDMA RX controller. Figure 4 shows the received data in the SignalTap analyzer.


Figure 4 Rx data in SignalTap analyzer

We implement NicheStack, a networking stack that receives data from the application, transforms it into network-specific packets, and sends them to the networking devices. The device is assigned the IP address 192.168.10.234, and the UDP port number assigned is 30. When the device is pinged, it responds with ICMP packets, as shown in Figure 5.

[Wireshark capture: 74-byte ICMP Echo (ping) request and reply packets, id=0x0001, exchanged between the host PC, 192.168.10.235 (MAC 90:1b:0e:0c:2b:e9), and the device, 192.168.10.234 (MAC 00:07:ed:ff:cd:15).]

Figure 5 ICMP packets of ping request and reply

To check the UDP functionality, we send UDP packets to the device with the PER (Packet Error Rate) test tool shown in Figure 6, which measures the packet error rate, and we enable local loopback on the device.

Figure 6 PER Test Tool

In the PER test tool, we set the remote IP to the FPGA device IP, 192.168.10.234, and the remote RX port to 30, because the listen port of the device is set to 30. The local RX and TX ports are set to ports 1234 and 1235 of the host PC. The data rate is set to 2 Mb/s and the packet length to 1200 bytes. When the transceiver is turned on, it sends packets to and receives packets from the device. Among 27155 received packets, 11 packets were lost, so the packet error rate is very low, about 0.04%. Figure 7 shows the UDP packet communication in Wireshark.
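The packet error rate reported by the PER tool is the ratio of lost packets to packets counted. The Python sketch below works through that arithmetic for the figures above and also computes the packet rate implied by the 2 Mb/s and 1200-byte settings. Two assumptions are labelled in the code: whether 27155 is the denominator or the number received (both give about 0.04%), and that the 2 Mb/s counts payload bits only, which the paper does not state. The function names are hypothetical.

```python
def packet_error_rate(lost: int, counted: int) -> float:
    """Lost packets divided by the packets the test counted."""
    if counted <= 0:
        raise ValueError("counted must be positive")
    return lost / counted


def packets_per_second(rate_bps: float, packet_bytes: int) -> float:
    """Packet rate implied by a data rate, ASSUMING it counts payload bits only."""
    return rate_bps / (packet_bytes * 8)


per_received = packet_error_rate(11, 27155)   # assumption: 27155 is the denominator
per_sent = packet_error_rate(11, 27155 + 11)  # alternative: 27155 received + 11 lost
pps = packets_per_second(2_000_000, 1200)     # 2 Mb/s, 1200-byte packets
print(f"PER = {per_received:.2e} ({per_received:.3%}), {pps:.1f} packets/s")
```

Either reading gives a packet error rate of roughly 4 × 10^-4, so the ambiguity in the wording does not change the conclusion.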


[Wireshark capture, display filter ip.src eq 192.168.10.234: 1242-byte UDP frames from 192.168.10.234, source port 30, to 192.168.10.235, destination port 1235 (mosaicsyssvc1). Frame 39 detail: Ethernet II, Src Altera_ff:cd:15 (00:07:ed:ff:cd:15), Dst FujitsuT_0c:2b:e9 (90:1b:0e:0c:2b:e9); Internet Protocol version 4, header length 20 bytes, DSCP 0x00, total length 1228, identification 0x0016 (22), flags 0x00.]

Figure 7 UDP data packets
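As a cross-check of the capture, the Ethernet, IPv4, and UDP header bytes at the start of frame 39 (the first 42 bytes, transcribed from the Wireshark hex pane) can be decoded with Python's `struct` module. The sketch assumes a 20-byte IPv4 header without options, as the capture shows; `parse_headers` and `mac_str` are hypothetical helpers.

```python
import ipaddress
import struct

# First 42 bytes of frame 39, transcribed from the capture's hex pane.
CAPTURED_HEADERS = bytes.fromhex(
    "901b0e0c2be9" "0007edffcd15" "0800"                 # Ethernet II: dst, src, type
    "450004cc00160000" "4011dee5" "c0a80aea" "c0a80aeb"  # IPv4 header, no options
    "001e04d304b8afaa"                                   # UDP header
)


def mac_str(raw: bytes) -> str:
    return ":".join(f"{b:02x}" for b in raw)


def parse_headers(frame: bytes) -> dict:
    """Decode Ethernet II, IPv4 and UDP header fields from the start of a frame."""
    dst, src = frame[0:6], frame[6:12]
    (ethertype,) = struct.unpack("!H", frame[12:14])
    (ver_ihl, _tos, total_len, ident, _frag, ttl, proto, _csum,
     src_ip, dst_ip) = struct.unpack("!BBHHHBBH4s4s", frame[14:34])
    ihl = (ver_ihl & 0x0F) * 4  # IPv4 header length in bytes
    udp_off = 14 + ihl
    sport, dport, udp_len, _ucsum = struct.unpack("!HHHH", frame[udp_off:udp_off + 8])
    return {
        "dst_mac": mac_str(dst), "src_mac": mac_str(src), "ethertype": ethertype,
        "version": ver_ihl >> 4, "header_len": ihl, "total_length": total_len,
        "identification": ident, "ttl": ttl, "protocol": proto,
        "src_ip": str(ipaddress.IPv4Address(src_ip)),
        "dst_ip": str(ipaddress.IPv4Address(dst_ip)),
        "src_port": sport, "dst_port": dport, "udp_length": udp_len,
    }


fields = parse_headers(CAPTURED_HEADERS)
```

The decoded lengths are consistent with the PER test setup: 8 (UDP header) + 1200 (payload) = 1208 bytes of UDP, 20 + 1208 = 1228 bytes of IPv4, and 14 + 1228 = 1242 bytes per frame. The 1242 bytes reported by Wireshark therefore do not include the 4-byte FCS.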

We compare our work with other implementations in terms of speed (MHz), maximum Ethernet frame length (bytes), and maximum Ethernet speed (Mb/s); the results are summarized in TABLE I. The other works all implement the stack in Xilinx devices, whereas we implement it in an Altera device. [2] provides three different UDP/IP cores: Minimum, Medium, and Advanced. [9] presents a complete TCP/IP stack implementation, from which we consider only the data related to the UDP/IP implementation. [10] implements only the Ethernet stack, and [11] presents a complete UDP/IP core but provides neither UDP checksum capability nor the ARP protocol.
TABLE I

Ref. No.       Speed (MHz)   Max. Frame Length (bytes)   Max. Ethernet Speed (Mb/s)
Our            125           1518                        1000
[2]-Min.       90.7          256                         100
[2]-Med.       60.3          256                         100
[2]-Adv.       105.6         1518                        1000
[9] (Dollas)   77            -                           100
[10]           50            -                           100


In terms of speed, our work gives the best performance. For maximum Ethernet frame length, our implementation and the advanced UDP/IP core of [2] can handle frames of 1518 bytes. [9] and [10] do not provide information about frame length. The maximum Ethernet speed of 1000 Mb/s is supported by three designs, as shown in the table.


[5] K. S. Siyan and T. Parker, TCP/IP Unleashed, 3rd ed. Sams Publishing, August 2002.

[6] L. Dostálek and A. Kabelová, Understanding TCP/IP: A Clear and Comprehensive Guide. Packt Publishing, April 2006.

[7] P.-K. Lam and S. Liew, "UDP-Liter: an improved UDP protocol for real-time multimedia applications over wireless links," in 1st International Symposium on Wireless Communication Systems, 2004, Sept. 2004, pp. 314-318.

[8] Altera Corporation, "Triple Speed Ethernet MegaCore Function User Guide," 101 Innovation Drive, San Jose, CA, Software Version 9.1, November 2009.

[9] A. Dollas, I. Ermis, I. Koidis, I. Zisis, and C. Kachris, "An open TCP/IP core for reconfigurable logic," in 13th Annual IEEE Symposium on Field-Programmable Custom Computing Machines (FCCM 2005), April 2005, pp. 297-298.

[10] N. M. Khalilzad and S. Pourshakour, "FPGA implementation of real-time Ethernet communication using RMII interface," in 3rd International Conference on Communication Software and Networks (ICCSN), IEEE, 2011.

[11] N. Alachiotis, S. Berger, and A. Stamatakis, "Efficient PC-FPGA communication over Gigabit Ethernet," in 2010 IEEE 10th International Conference on Computer and Information Technology (CIT), IEEE, 2010.


V. Conclusion 

This paper presents a TSEMAC and NicheStack networking stack implementation on an FPGA. A UDP/IP stack is implemented and verified on an Altera Cyclone III device; it uses 78% of the FPGA's total logic elements and can operate at a maximum frequency of 125 MHz. The design uses a Nios II processor to control and configure the other modules: since resource efficiency is one of the main goals, an embedded processor is selected to act as the CPU within the stack. The design is tested and analyzed with the SignalTap analyzer and Wireshark. We implement the UDP/IP functionality using NicheStack, where IP provides basic datagram delivery services, on top of which UDP provides a multiplexing feature so that multiple applications can use the UDP/IP stack concurrently. We also compare our design with other work in terms of speed, maximum frame length, and maximum Ethernet speed. The results show that our design achieves the highest speed among the compared designs, while for the other two parameters it offers an intermediate solution.